Abstract

A limit order book provides information on available limit order prices and their volumes. Based on these quantities, we give an empirical result on the relationship between the bid-ask liquidity balance and the trade sign, and we show that the liquidity balance on the best bid/best ask is quite informative for predicting the direction of the next market order. Moreover, we define a price jump as the arrival of a sell (buy) market order that is executed at a price smaller (larger) than the best bid (best ask) price at the moment just after the preceding market order arrival. Features are then extracted related to limit order volumes, limit order price gaps, market order information and limit order event information. Logistic regression is applied to predict the price jump from the limit order book's features. LASSO logistic regression is introduced to perform variable selection, from which we are able to highlight the importance of different features in predicting the future price jump. In order to remove intraday seasonality, the analysis is based on two separate datasets: a morning dataset and an afternoon dataset. Based on an analysis of the forty largest French stocks of the CAC40, we find that the trade sign and the market order size, as well as the liquidity on the best bid (best ask), are consistently informative for predicting the incoming price jump.



Introduction



The determination of jumps in financial time series already has a long history as a challenging, theoretically interesting and practically important problem. Be it from the point of view of the statistician trying to separate, in spot prices, those moves corresponding to "jumps" from those that are compatible with the hypothesis of a process with continuous paths, or from the point of view of the practitioner (market maker, algorithmic trader, arbitrageur) who is in dire need of knowing the direction and the amplitude of the next price change, there is a vast, still unsatisfied interest in this question. Several attempts have been made at theorizing the observability of the difference between processes with continuous and discontinuous paths, and the major breakthrough in that direction is probably due to Barndorff-Nielsen and Shephard [3], who introduced the concept of bi-power variation and showed that, in a nutshell, the occurrence of jumps can be seen in the limiting behavior of the bi-power variation as the time step goes to zero: for a process with continuous paths, this quantity converges to (a multiple of) the integrated variance, which is also the limit of the realized variance, so that a discrepancy between the two limits is caused by the occurrence of jumps.
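The bi-power idea can be illustrated numerically. The Python sketch below (standard library only) compares the realized variance with the usual scaled bi-power variation $\frac{\pi}{2}\sum_i |r_i||r_{i-1}|$ on a simulated Brownian path, with and without one added jump. The volatility, sample size, jump size and function names are illustrative assumptions, not values taken from [3].

```python
import math
import random
from typing import List


def realized_variance(returns: List[float]) -> float:
    """Sum of squared returns: tends to integrated variance plus squared jumps."""
    return sum(r * r for r in returns)


def bipower_variation(returns: List[float]) -> float:
    """(pi/2) * sum |r_i| |r_{i-1}|: tends to integrated variance even with jumps."""
    return (math.pi / 2.0) * sum(
        abs(returns[i]) * abs(returns[i - 1]) for i in range(1, len(returns))
    )


def simulate_returns(n: int, sigma: float, jump: float, seed: int) -> List[float]:
    """n Brownian increments on [0, 1] with constant volatility sigma (illustrative),
    plus one jump of size `jump` added to the middle increment."""
    rng = random.Random(seed)
    dt = 1.0 / n
    returns = [rng.gauss(0.0, sigma * math.sqrt(dt)) for _ in range(n)]
    returns[n // 2] += jump
    return returns


if __name__ == "__main__":
    for jump in (0.0, 0.05):
        r = simulate_returns(n=23400, sigma=0.02, jump=jump, seed=1)
        rv, bv = realized_variance(r), bipower_variation(r)
        # With a jump, RV - BV is close to jump**2; without one it is close to 0.
        print(f"jump={jump}: RV={rv:.6f}  BV={bv:.6f}  RV-BV={rv - bv:.6f}")
```

Realized variance absorbs the squared jump, while the bi-power variation stays close to the integrated variance $\sigma^2 = 0.0004$ of the simulated path, so their difference isolates the jump contribution.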

Since then, many authors, in particular Aït-Sahalia and Jacod (2009) [2], have helped shed further light on this phenomenon, and one can safely say that rigorous statistical tests for identifying continuous-time, real-valued processes with discontinuous paths are now available to the academic community as well as to the applied researcher.

However, the physics of modern, electronic, order-driven markets is not easily recast in the setting of real-valued, continuous-time processes, and the time series of prices, no matter how high the sampling frequency, is no longer the most complete and accurate type of information one can get from the huge set of financial data at hand. In fact, a relatively recent trend of studies has emerged over the past 10 years, in which the limit order book became the center of interest and price changes are but a by-product of the more complicated set of changes in limit orders, market orders, order cancellations, etc.; see e.g. Chakraborti et al. (2009) [6] and Abergel et al. (2011) [1] for the latest developments in the econophysics of order-driven markets. This new standpoint is quite enlightening, in that the physics of price formation becomes much more apparent, but it calls for a drastic change in the basic modeling tools: prices now live on a discrete grid with a step size given by the tick, and price changes occur at discrete times. Furthermore, a host of important events that affect the order book rather than the price itself, events which are therefore essential in understanding the driving forces of price changes, now become observable, and their role in the price dynamics must be taken into account when one is interested in understanding the latter.

Our point of view is slightly different: rather than concentrating on the one-dimensional price time series, we want to model the dynamics of the whole order book in event time, and focus on some specific events that can be interpreted in terms of jumps. To do so, we shall depart from the classical definitions (if any such thing exists) of a jump in a financial time series, and restrict ourselves to the more natural and more realistic concepts of an inter-trade price jump and a trade-through, which are also more amenable to experimental validation.

An inter-trade price jump is defined as an event where a market order is executed at a price which is smaller (larger) than the best limit price on the bid (ask) side just after the preceding market order arrival. Equivalently, an inter-trade price jump ensures that a limit order submitted at the best bid (best ask) just after a market order arrival is executed by the next market order arrival. A trade-through corresponds to the arrival of a new market order whose size is larger than the quantity available at the best limit on the bid side (for a sell order) or the ask side (for a buy order) of the order book. By nature, such an order implies an automatic and instantaneous price change, the value of which is exactly the difference in monetary units between the best limit price before and after the transaction on the relevant side of the order book. A trade-through can thus be interpreted as the instantaneous price change triggered by a market order, whereas an inter-trade price jump reflects post-trade market impact. Most research on limit order books is based on stocks and is concerned with characterizing features such as liquidity, volatility and the bid-ask spread rather than with prediction; see Hasbrouck (1991) [10], Hausman et al. (1992) [12], Keim and Madhavan (1996), Lo et al. (2002), Lillo et al. (2003), Hasbrouck (2006), Parlour and Seppi (2008), Jondeau et al. (2008) and Linnainmaa and Rosu (2009) [16]. Trade-throughs have also been the object of several recent studies in the econometrics and finance literature; see e.g. Foucault and Menkveld (2008) [8] (for a cross-sectional study) and Pomponio and Abergel (2011) [19].
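The two definitions above can be turned into event labels. The following Python sketch assumes a hypothetical record per market order (direction, size, execution price, volumes at the best levels just before the order, best quotes just after it); the field and function names are ours, and taking the worst execution level as the price of a multi-level order is an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MarketOrderEvent:
    """Hypothetical record for one market order and the book around it."""
    direction: str          # "buy" or "sell"
    size: int               # executed volume
    price: float            # execution price (worst level reached, an assumption)
    bid_volume_before: int  # volume at the best bid just before the order
    ask_volume_before: int  # volume at the best ask just before the order
    best_bid_after: float   # best bid just after the order
    best_ask_after: float   # best ask just after the order


def is_trade_through(mo: MarketOrderEvent) -> bool:
    """True if the order is larger than the quantity at the best opposite limit."""
    best_volume = mo.bid_volume_before if mo.direction == "sell" else mo.ask_volume_before
    return mo.size > best_volume


def inter_trade_jumps(orders: List[MarketOrderEvent]) -> List[Tuple[bool, bool]]:
    """(bid_jump, ask_jump) for each consecutive pair of market orders (k, k+1):
    order k+1 executes below the best bid, or above the best ask, observed
    just after order k."""
    return [
        (nxt.price < prev.best_bid_after, nxt.price > prev.best_ask_after)
        for prev, nxt in zip(orders, orders[1:])
    ]


if __name__ == "__main__":
    orders = [
        MarketOrderEvent("sell", 60, 9.99, 50, 30, 9.99, 10.01),  # eats 10.00, then 9.99
        MarketOrderEvent("buy", 10, 10.01, 30, 30, 9.99, 10.01),
        MarketOrderEvent("buy", 20, 10.02, 30, 25, 9.99, 10.02),  # asks at 10.01 gone
    ]
    print([is_trade_through(o) for o in orders])
    print(inter_trade_jumps(orders))
```

In this toy sequence, the first sell order of 60 shares against 50 shares at the best bid is a trade-through, and the last buy order, executed at 10.02 while the best ask just after the previous market order was 10.01, is an ask-side inter-trade price jump.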

In this work, we investigate whether the order book shape is informative for inter-trade price jump prediction and whether trade-throughs contribute to this prediction. Recently, many researchers have proposed machine learning methods to make predictions from limit order book data. Blazejewski and Coggins (2005) [4] present a non-parametric model for trade sign (market order initiator) inference and show that the limit order book shape and historical trade sizes are informative for trade sign prediction. Fletcher et al. (2010) [7] applied multiple kernel learning with support vector machines to predict the EURUSD price evolution from limit order book information. Here, logistic regression is used to predict the occurrence of inter-trade price jumps. Variable selection by LASSO logistic regression provides insight into the dynamics of the limit order book and allows us to select the most informative features for predicting the relevant events. We will show that some features of the limit order book have strong predictive and explanatory power, allowing one to make a sound prediction of the occurrence of an inter-trade price jump given the state of the limit order book. This result is interesting in itself in that it allows one to use the full set of available information to make predictions: whereas the history of the price itself is known not to be a good predictor of the next price moves (the so-called efficiency of the market is relatively hard to beat when one only uses price information), we shall show that the limit order volumes contain more information, and that the market order size also contributes to an accurate prediction of inter-trade price jumps. Trade-throughs, on the other hand, turn out to be less informative for this prediction than expected.

This paper is organized as follows. Section 1 describes the main notation for the limit order book. Section 2 gives an empirical result on the relationship between the bid-ask liquidity balance and the trade sign. Section 3 introduces logistic regression for inter-trade price jump prediction and LASSO logistic regression for variable selection. The last section concludes.



1 Description and data notation 

The Euronext market uses NSC (Nouveau Système de Cotation) for electronic trading. During continuous trading from 9:00 to 17:30, NSC matches market orders against the best limit orders on the opposite side. Various order types are accepted in NSC, such as limit orders (orders to be traded at a fixed price for a certain amount), market orders (order execution without price constraint), stop orders (issuing limit orders or market orders when a trigger price is reached) and iceberg orders (only a part of the size is visible in the book). Limit orders are posted to the electronic trading system and placed in the book according to their prices; see Figure 1. A market order is an order to be executed at the best available price in the limit order book. The lowest price of limit sell orders is called the best ask; the highest price of limit buy orders is called the best bid. The gap between the best bid and the best ask is called the spread. When a buy order arrives with a price higher than or equal to the best ask price (or, symmetrically, a sell order with a price lower than or equal to the best bid price), a trade occurs and the limit order book is updated accordingly. Limit orders can also be cancelled if they have not been executed, so the limit order book can be modified by limit order cancellations, limit order arrivals or market order arrivals. In the case of iceberg orders, the disclosed part has the same priority as a regular limit order, while the hidden part has lower priority. The hidden part becomes visible as soon as the disclosed part is executed. It is quite rare for the hidden part to be consumed by a market order without having been visible beforehand. In this study, we neglect stop orders and iceberg orders, which are relatively rare compared to limit order and market order events.
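To make the matching rule concrete, here is a minimal Python sketch of an order book aggregated by price level, processing the three event types retained in this study (limit order arrival, cancellation, market order). It is a simplification, not NSC itself: prices are integer ticks, time priority within a level is ignored, limit orders are assumed not to cross the spread, and stop and iceberg orders are left out as in the text. The class and method names are hypothetical.

```python
from typing import Dict, List, Tuple


class AggregatedBook:
    """Price-level view of a limit order book: price (in ticks) -> resting volume."""

    def __init__(self) -> None:
        self.bids: Dict[int, int] = {}
        self.asks: Dict[int, int] = {}

    def best_bid(self) -> int:
        return max(self.bids)

    def best_ask(self) -> int:
        return min(self.asks)

    def add_limit(self, side: str, price: int, size: int) -> None:
        """Limit order arrival (assumed non-marketable) on side "bid" or "ask"."""
        book = self.bids if side == "bid" else self.asks
        book[price] = book.get(price, 0) + size

    def cancel(self, side: str, price: int, size: int) -> None:
        """Limit order cancellation; an emptied level disappears from the book."""
        book = self.bids if side == "bid" else self.asks
        remaining = book[price] - size
        if remaining > 0:
            book[price] = remaining
        else:
            del book[price]

    def market_order(self, direction: str, size: int) -> List[Tuple[int, int]]:
        """Match against the best opposite levels; return the (price, volume) fills.
        A sell order hits the bids, a buy order lifts the asks."""
        book = self.bids if direction == "sell" else self.asks
        best = max if direction == "sell" else min
        fills: List[Tuple[int, int]] = []
        while size > 0 and book:
            price = best(book)
            traded = min(size, book[price])
            fills.append((price, traded))
            size -= traded
            if traded == book[price]:
                del book[price]  # level fully consumed: next level becomes best
            else:
                book[price] -= traded
        return fills
```

A sell market order larger than the best bid volume walks down to the next level, which is exactly the trade-through situation described in the introduction.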

In a limit order book, as shown in Figure 1, only a certain number of the best buy/sell limit prices are publicly visible. We denote the number of visible bid/ask limit prices by L.



Figure 1: Limit order book description. Limit order prices are discretized by the tick size.



In this study, for simplicity, we focus on limit order arrival events, limit order cancellation events and market order arrival events; see Figure 2. The number of visible limit order levels is set to L = 5. Our dataset is provided by NATIXIS via Thomson Reuters' 'Reuters Tick Capture Engine' and comprises trades and limit order activities of the 40 member stocks of the CAC40 index between April 1st 2011 and April 30th 2011. In order to exclude the opening and closing periods, we extract the data from 09:05 to 17:25. Every transaction and every limit order book modification is recorded with millisecond timestamps. The data contain information on the L best quotes on both the bid and ask sides. The trade data and quote data are matched.

Figure 2: Dynamics of the limit order book (curves: bid, ask and mid prices). The first event is a trade-through in which a market order consumes 60 shares on the bid side; then a new ask limit order of size 20 arrives inside the spread. Next, a cancellation at the best bid price occurs and the former second-best bid price becomes the best bid price; finally, a regular market order triggers a transaction of size 60.

Denote by t a time index running over all limit order book events. For i = 1, ..., L, $P^{b,i}_t$ and $P^{a,i}_t$ denote the i-th best log bid/ask quote instantaneously after the t-th event. We denote by $S_t = P^{a,1}_t - P^{b,1}_t$ the spread instantaneously after the t-th event. For i = 1, ..., L-1, $G^{b,i}_t = P^{b,i}_t - P^{b,i+1}_t$ and $G^{a,i}_t = P^{a,i+1}_t - P^{a,i}_t$ denote respectively the i-th bid (ask) limit price gap instantaneously after the t-th event. Besides, $V^{b,i}_t$ and $V^{a,i}_t$, for i = 1, ..., L, denote the log limit order volume on the i-th best bid/ask quote instantaneously after the t-th event. The volume of a trade is denoted by $V^{tr}_t$ ($V^{tr}_t = 0$ when there is no trade) and the price of a trade by $P^{tr}_t$ ($P^{tr}_t = 0$ when there is no trade; otherwise $P^{tr}_t$ is the bid (ask) price at which the market order is executed when it touches the bid (ask) side). Moreover, we introduce six dummy variables $BLO_t$, $ALO_t$, $BMO_t$, $AMO_t$, $BTT_t$ and $ATT_t$ to indicate the side of each event (bid or ask), respectively for limit order events ($BLO_t$ and $ALO_t$), market order events ($BMO_t$ and $AMO_t$) and trade-through events ($BTT_t$ and $ATT_t$). The variables are defined in Table 1.
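As an illustration of the notation, the sketch below computes the spread, price gaps and log volumes of Table 1 from one snapshot of the L visible levels. The function name, the input layout (best level first, raw prices and volumes) and the assumption that all L levels are non-empty are ours.

```python
import math
from typing import Dict, List, Sequence


def snapshot_features(
    bid_prices: Sequence[float],
    ask_prices: Sequence[float],
    bid_volumes: Sequence[float],
    ask_volumes: Sequence[float],
) -> Dict[str, List[float]]:
    """Table 1 quantities from the L visible levels just after one event.

    Index 0 is the best level on each side. Prices and volumes are raw (positive)
    numbers and are converted to logs here; every level is assumed non-empty.
    """
    pb = [math.log(p) for p in bid_prices]
    pa = [math.log(p) for p in ask_prices]
    L = len(pb)
    return {
        "spread": [pa[0] - pb[0]],                              # S_t
        "bid_gaps": [pb[i] - pb[i + 1] for i in range(L - 1)],  # G^{b,i}_t
        "ask_gaps": [pa[i + 1] - pa[i] for i in range(L - 1)],  # G^{a,i}_t
        "log_bid_volumes": [math.log(v) for v in bid_volumes],  # V^{b,i}_t
        "log_ask_volumes": [math.log(v) for v in ask_volumes],  # V^{a,i}_t
    }
```

With this sign convention all gaps are non-negative, since bid prices decrease and ask prices increase away from the best level.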

In order to capture the high-frequency dynamics in quotes and depths, we define a K-dimensional vector $R^1_t$ of the quote and depth variables of Table 1, observed instantaneously after the t-th event.

Table 1: Variable definitions

Variable       Description
$P^{b,i}_t$    the i-th best log bid price just after the t-th event
$P^{a,i}_t$    the i-th best log ask price just after the t-th event
$G^{b,i}_t$    the i-th bid price gap just after the t-th event
$S_t$          the spread just after the t-th event
$G^{a,i}_t$    the i-th ask price gap just after the t-th event
$V^{b,i}_t$    log volume of the i-th best bid quote just after the t-th event
$V^{a,i}_t$    log volume of the i-th best ask quote just after the t-th event
$BLO_t$        dummy variable equal to 1 if the t-th event is a limit order event at the bid side
$ALO_t$        dummy variable equal to 1 if the t-th event is a limit order event at the ask side
$BMO_t$        dummy variable equal to 1 if the t-th event is a market order event at the bid side
$AMO_t$        dummy variable equal to 1 if the t-th event is a market order event at the ask side
$BTT_t$        dummy variable equal to 1 if the t-th event is a trade-through event at the bid side
$ATT_t$        dummy variable equal to 1 if the t-th event is a trade-through event at the ask side

Modelling log prices and log volumes instead of absolute values is suggested by Potters and Bouchaud (2003) [20], who study the statistical properties of market impact and trades, and is common in many other empirical studies. Changes in log prices and log volumes can be interpreted as relative (percentage) changes.

Another vector of variables, indicating the nature of the t-th event, is denoted by

$$R^2_t = [BMO_t, AMO_t, BLO_t, ALO_t, BTT_t, ATT_t].$$

Table 2 provides descriptive statistics of the data used in this paper, covering limit order events, market order events, inter-trade price jump events and trade-through events. The analysis is done on two separate datasets: a morning dataset (between 09:05 and 13:15) and an afternoon dataset (between 13:15 and 17:25). We observe that, for most stocks, there are more market order events in the afternoon than in the morning. Similarly, inter-trade price jump events are slightly more frequent in the afternoon than in the morning. However, trade-through events are, in aggregate, more frequent in the morning than in the afternoon.
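The morning/afternoon split can be sketched as a simple filter on event timestamps. The boundary convention (09:05 and 17:25 included, an event at exactly 13:15 assigned to the afternoon) and the function name are assumptions.

```python
from datetime import time
from typing import Iterable, List, Tuple

# Windows from the text: 09:05 to 17:25 overall, split at 13:15.
START, SPLIT, END = time(9, 5), time(13, 15), time(17, 25)


def split_sessions(
    events: Iterable[Tuple[time, object]],
) -> Tuple[List[Tuple[time, object]], List[Tuple[time, object]]]:
    """Drop events outside [09:05, 17:25] and split the rest into
    morning [09:05, 13:15) and afternoon [13:15, 17:25]."""
    morning, afternoon = [], []
    for ts, payload in events:
        if START <= ts < SPLIT:
            morning.append((ts, payload))
        elif SPLIT <= ts <= END:
            afternoon.append((ts, payload))
    return morning, afternoon
```

Since `datetime.time` carries microseconds, millisecond timestamps can be compared directly.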



2 Empirical facts: bid-ask liquidity balance and trade sign

Before analyzing price jump prediction, we investigate whether limit order volume information plays a role in determining the direction of the next market order (trade sign). In order to study the conditional probability of the trade sign given knowledge of the bid/ask limit order liquidity, we define the bid-ask volume ratio corresponding to depth i just before the k-th trade, $W_{t_k-1}(i)$ for $i \in \{1, \ldots, L\}$, as

$$W_{t_k-1}(i) = \log\left(\frac{\sum_{j=1}^{i} \exp\left(V^{b,j}_{t_k-1}\right)}{\sum_{j=1}^{i} \exp\left(V^{a,j}_{t_k-1}\right)}\right) = \log\left(\sum_{j=1}^{i} \exp\left(V^{b,j}_{t_k-1}\right)\right) - \log\left(\sum_{j=1}^{i} \exp\left(V^{a,j}_{t_k-1}\right)\right), \quad (1)$$
Table 2: Summary of limit order (LO) events, market order (MO) events, inter-trade price jump events and trade-through (TT) events, CAC40 stocks, April 2011. AM: morning dataset; PM: afternoon dataset.

Stock      #LO             #MO           #BidJump      #AskJump      #BidTT        #AskTT
           AM      PM      AM     PM     AM     PM     AM     PM     AM     PM     AM     PM
ACCP.PA    40505   44788   906    1067   120    125    121    125    35     31     37     36
AIRP.PA    65775   83199   1358   1715   190    257    201    264    55     66     49     55
ALSO.PA    92410   102069  1319   1590   165    199    177    201    48     50     43     39
ALUA.PA    141048  110173  1900   2379   202    233    214    242    84     96     84     105
AXAF.PA    156049  131722  1694   1951   155    167    160    158    31     29     35     26
BNPP.PA    432091  236054  3164   3160   377    368    386    372    104    77     108    74
BOUY.PA    28864   36112   973    1227   103    140    114    154    29     33     31     30
CAGR.PA    133449  86078   1795   1645   177    129    178    124    39     22     45     23
CAPP.PA    27679   30846   993    1262   125    164    115    142    40     41     37     40
CARR.PA    111559  104513  1536   1799   186    244    171    236    45     38     53     43
CNAT.PA    25216   29859   1144   1228   142    147    125    126    39     31     45     32
DANO.PA    124929  106412  1618   2160   188    260    188    267    57     60     50     48
EAD.PA     38720   35618   772    983    98     102    72     87     23     23     18     19
EDF.PA     151715  75212   1881   1750   190    158    182    153    62     33     66     38
ESSI.PA    21678   31743   528    751    55     66     55     70     13     15     10     9
FTE.PA     78140   83328   1370   1710   82     85     67     77     16     12     14     12
GSZ.PA     145293  105185  1781   2052   190    217    160    189    48     43     37     34
ISPA.AS    165149  170835  1877   2663   205    294    216    288    51     76     61     72
LAFP.PA    116640  87808   1149   1464   167    213    177    217    61     54     54     52
LVMH.PA    80949   84063   1006   1181   74     63     78     62     15     12     15     12
MICP.PA    101712  65052   1483   1564   200    182    197    189    68     46     55     45
OREP.PA    33981   40395   1182   1393   152    187    154    193    44     40     42     39
PERP.PA    65502   43922   971    1154   133    124    136    141    31     22     30     22
PEUP.PA    51684   57536   1166   1258   137    135    133    132    44     30     43     33
PRTP.PA    29682   31908   539    627    36     34     44     36     6      6      5      5
PUBP.PA    71049   52461   1093   1350   136    144    134    144    36     35     36     31
RENA.PA    136579  96872   1766   1843   242    213    262    234    86     59     103    75
SASY.PA    111349  94709   1709   2221   153    170    143    184    37     39     37     36
SCHN.PA    100690  82453   1297   1397   97     78     92     86     23     12     23     16
SEVI.PA    33050   29122   619    659    72     57     68     55     17     12     16     11
SGEF.PA    94111   65542   1454   1624   178    198    169    199    59     53     50     38
SGOB.PA    158051  148326  1931   2336   240    308    245    311    75     83     73     78
SOGN.PA    285430  172865  3455   3317   496    475    450    456    182    150    183    137
STM.PA     96566   94170   1148   1367   172    213    168    212    59     52     63     59
TECF.PA    77319   70533   1276   1405   163    172    161    169    51     42     45     35
TOTF.PA    287449  224132  2388   3228   308    396    301    418    81     101    80     89
UNBP.PA    41956   40961   612    755    63     68     61     63     8      9      8      8
VIE.PA     49086   52755   936    1074   106    105    108    104    19     15     22     20
VIV.PA     70356   73725   1478   1822   130    151    122    143    31     24     31     22
VLLP.PA    45336   37801   1198   1471   167    208    172    195    58     62     54     50
where $t_k$ is the time index of the k-th market order event.
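Equation (1) is a difference of two log-sum-exp terms over the log volumes. A small Python sketch, using a numerically stable log-sum-exp; the function names are illustrative, and the inputs are the log volumes of the visible levels, best level first.

```python
import math
from typing import Sequence


def log_sum_exp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def volume_ratio(
    log_bid_volumes: Sequence[float], log_ask_volumes: Sequence[float], depth: int
) -> float:
    """W(depth) of Equation (1): log of the cumulative bid volume over the
    cumulative ask volume on the first `depth` levels, from log volumes."""
    return log_sum_exp(log_bid_volumes[:depth]) - log_sum_exp(log_ask_volumes[:depth])
```

For depth 1 the ratio reduces to $V^{b,1} - V^{a,1}$; for example, 100 shares on the best bid against 25 on the best ask give $W(1) = \log 4$.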

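For all $x \in \mathbb{R}_+$, the conditional probability of a future buy market order (positive trade sign), i.e. that the next trade is triggered by a buy market order, given $W_{t_k-1}(i) > x$, is defined as

$$\mathbb{P}\left(\epsilon_{t_k} = 1 \mid W_{t_k-1}(i) > x\right), \quad (2)$$

where we denote the trade sign at time $t_k$ by $\epsilon_{t_k}$.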

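Similarly, for all $x \in \mathbb{R}_+$, the conditional probability of a future sell market order (negative trade sign), i.e. that the next trade is triggered by a sell market order, given $W_{t_k-1}(i) < -x$, is defined as

$$\mathbb{P}\left(\epsilon_{t_k} = -1 \mid W_{t_k-1}(i) < -x\right). \quad (3)$$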
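In practice these conditional probabilities are estimated by empirical frequencies over the trades of a dataset. The sketch below assumes that estimator, conditions the sell-side probability on $W < -x$ (the mirror image of the buy side, an assumed convention), and uses hypothetical function names; `ratios[k]` holds the volume ratio observed just before trade k and `signs[k]` its trade sign (+1 buy, -1 sell).

```python
import math
from typing import Sequence


def conditional_buy_probability(ratios: Sequence[float], signs: Sequence[int], x: float) -> float:
    """Empirical P(eps = +1 | W > x): share of buy trades among the trades whose
    preceding volume ratio W exceeds x (nan if no trade qualifies)."""
    selected = [s for w, s in zip(ratios, signs) if w > x]
    if not selected:
        return math.nan
    return sum(1 for s in selected if s == 1) / len(selected)


def conditional_sell_probability(ratios: Sequence[float], signs: Sequence[int], x: float) -> float:
    """Empirical P(eps = -1 | W < -x), the mirror image (assumed convention)."""
    selected = [s for w, s in zip(ratios, signs) if w < -x]
    if not selected:
        return math.nan
    return sum(1 for s in selected if s == -1) / len(selected)
```

Evaluating these functions on a grid of thresholds x, once per depth i, produces curves of the kind shown in Figures 3 and 4; as x grows, fewer trades qualify and the estimate becomes noisier.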



Figure 3: Conditional probability of a buy market order vs. bid-ask volume ratio, April 2011. Panels: BNPP.PA, SOGN.PA, CAGR.PA, CNAT.PA; x-axis: bid-ask volume ratio.



The relationship between $\mathbb{P}(\epsilon_{t_k} = 1 \mid W_{t_k-1}(i) > x)$ and x for $i \in \{1, \ldots, L\}$ is shown in Figures 3 and 4. We observe that the conditional probability of the next trade sign is strongly related to the bid-ask volume ratio corresponding to depth 1. Nevertheless, the dependence between the conditional probability of the next trade sign and the bid-ask volume ratio corresponding to depths larger than 1 is much noisier. Figure 8 in the Appendix shows the relationship between $\mathbb{P}(\epsilon_{t_k} = 1 \mid W_{t_k-1}(1) > x)$ and x for all stocks of the CAC40. It is worth remarking that the conditional probability of the trade sign reaches 0.80 on average when the liquidity on the best limit prices is strongly unbalanced.
Figure 4: Conditional probability of a sell market order vs. bid-ask volume ratio, April 2011. Panels: BNPP.PA, SOGN.PA, CAGR.PA, CNAT.PA; x-axis: bid-ask volume ratio; one curve per depth (Depth = 1, ..., 5).



3 Logistic regression analysis and variable selection by LASSO

3.1 Logistic regression analysis 

The result shown in the previous section reveals that the bid-ask liquidity balance provides important information on the incoming market order. In this section, we introduce standard logistic regression to predict the occurrence of inter-trade price jumps, and use LASSO regularization to highlight the importance of each variable in this prediction.

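We denote the number of market order events by N. For each $i \in \{1, \ldots, N\}$, let $t_i$ be the time index of the i-th market order event and let

$$X_i = \left[1, V^{tr}_{t_i}, R^1_{t_i}, R^1_{t_i-1}, \ldots, R^1_{t_i-m+1}, R^2_{t_i}, R^2_{t_i-1}, \ldots, R^2_{t_i-n+1}\right] \in \mathbb{R}^{p+1}, \quad p = 1 + m(2L-1) + 6n,$$

be the explanatory variables summarizing the order book information available when the $t_i$-th event is a market order event. $Y_i$ is a binary variable indicating whether the next market order event is a bid (ask) inter-trade price jump, defined as follows:

Bid-side inter-trade price jump indicator:

$$Y_i = \begin{cases} 1, & \text{if } P^{tr}_{t_{i+1}} < P^{b,1}_{t_i}, \\ 0, & \text{otherwise,} \end{cases} \quad (4)$$

or

Ask-side inter-trade price jump indicator:

$$Y_i = \begin{cases} 1, & \text{if } P^{tr}_{t_{i+1}} > P^{a,1}_{t_i}, \\ 0, & \text{otherwise.} \end{cases} \quad (5)$$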
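The layout of $X_i$ can be checked with a short sketch that stacks the current and lagged vectors. It assumes that each $R^1$ vector has $2L-1$ components, as implied by the formula for p, that histories are indexed in event time with the most recent entry last, and it uses a hypothetical function name.

```python
from typing import List, Sequence


def design_row(
    v_tr: float,
    r1_history: Sequence[Sequence[float]],
    r2_history: Sequence[Sequence[float]],
    m: int,
    n: int,
) -> List[float]:
    """X_i = [1, V^tr_{t_i}, R^1_{t_i}, ..., R^1_{t_i-m+1}, R^2_{t_i}, ..., R^2_{t_i-n+1}].

    r1_history[-1] and r2_history[-1] are the vectors just after the i-th market
    order; r1_history[-2] is one event earlier, and so on.
    """
    row = [1.0, v_tr]  # intercept, then market order volume
    for lag in range(m):
        row.extend(r1_history[-1 - lag])
    for lag in range(n):
        row.extend(r2_history[-1 - lag])
    return row
```

With L = 5 and m = n = 5, the row has 1 + 1 + 5 x 9 + 5 x 6 = 77 entries, i.e. an intercept plus p = 76 regressors.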
In the logistic model, the probability of a bid (ask) inter-trade price jump occurrence is assumed to be given by

$$P_\beta(Y = 1 \mid X) = \frac{1}{1 + e^{-\beta^T X}}, \quad (6)$$

where $\beta = [\beta_0, \beta_1, \ldots, \beta_p]^T$. Observing that for i = 1

$$W_{t_k-1}(1) = V^{b,1}_{t_k-1} - V^{a,1}_{t_k-1}, \quad (7)$$

we see that the linearity of the log-odds $\log\frac{P_\beta(Y=1 \mid X)}{1 - P_\beta(Y=1 \mid X)} = \beta^T X$ in the variables $V^{b,1}$ and $V^{a,1}$ in Equation (6) allows us to capture the contribution of $W_{t_k-1}(1)$ to the prediction.

The parameters β are unknown and must be estimated from the data; we use maximum likelihood. It is well known that the negative log-likelihood function is given by

$$\ell(\beta) = \sum_{i=1}^{N} \left\{ \log\left(1 + e^{\beta^T X_i}\right) - y_i \beta^T X_i \right\}. \quad (8)$$

This function is convex and can therefore be minimized using a standard optimization method.
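Because the negative log-likelihood (8) is convex, even plain gradient descent reaches its minimum. The following sketch is a minimal stand-in for the standard optimization method mentioned above (Newton or quasi-Newton methods converge faster); the learning rate, iteration count and function names are assumptions.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def neg_log_likelihood(beta: Sequence[float], X, y) -> float:
    """Equation (8): sum_i log(1 + exp(beta'x_i)) - y_i beta'x_i."""
    total = 0.0
    for xi, yi in zip(X, y):
        z = dot(beta, xi)
        # log(1 + e^z) = max(z, 0) + log(1 + e^{-|z|}), stable for any z
        total += max(z, 0.0) + math.log1p(math.exp(-abs(z))) - yi * z
    return total


def fit_logistic(X, y, lr: float = 0.1, n_iter: int = 2000) -> List[float]:
    """Plain gradient descent on the mean negative log-likelihood."""
    beta = [0.0] * len(X[0])
    for _ in range(n_iter):
        grad = [0.0] * len(beta)
        for xi, yi in zip(X, y):
            r = sigmoid(dot(beta, xi)) - yi  # derivative of the per-sample loss in z
            for j, xij in enumerate(xi):
                grad[j] += r * xij
        beta = [b - lr * g / len(X) for b, g in zip(beta, grad)]
    return beta


def predict_proba(beta: Sequence[float], x: Sequence[float]) -> float:
    """P_beta(Y = 1 | X = x) from Equation (6)."""
    return sigmoid(dot(beta, x))
```

Each row of `X` starts with the constant 1, so `beta[0]` plays the role of the intercept $\beta_0$.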



3.2 Variable selection by LASSO 

Since the number of explanatory variables p is quite large, it is of interest to perform a variable selection procedure to select the most important variables. A classical variable selection procedure when the number of regressors is large is the LASSO procedure; see Hastie et al. (2003) [11]. Instead of using a BIC penalization, the LASSO procedure adds to the negative log-likelihood the L1 norm of the logistic coefficients, which is known to induce sparse solutions. This penalization therefore has an automatic variable selection effect. The LASSO estimate for logistic regression is defined by

$$\hat{\beta}^{lasso}(\lambda) = \arg\min_{\beta} \left\{ \sum_{i=1}^{N} \left( -y_i \beta^T X_i + \log\left(1 + e^{\beta^T X_i}\right) \right) + \lambda \sum_{j=1}^{p} |\beta_j| \right\}. \quad (9)$$

As for unpenalized logistic regression, there is no closed-form solution. Because of the nature of the L1 penalty on $\sum_{j=1}^{p} |\beta_j|$, making λ sufficiently large will cause some of the coefficients to be exactly zero. Germain and Roueff (2009) [9] give the uniform consistency and a functional central limit theorem for the LASSO regularization path for the general linear model.
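Problem (9) is commonly solved by coordinate descent or proximal gradient methods. As a sketch only, the following Python code uses proximal gradient descent (ISTA), whose soft-thresholding step is what produces exactly-zero coefficients; the intercept is left unpenalized, consistent with the penalty running from j = 1. The step size, iteration count and function names are assumptions, and this is not necessarily the solver used to obtain the results of Section 3.3.

```python
import math
from typing import List


def _sigmoid(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _soft_threshold(v: float, t: float) -> float:
    """Proximal operator of t * |.|: shrink towards 0, exactly 0 inside [-t, t]."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def lasso_logistic(X, y, lam: float, step: float = 0.1, n_iter: int = 2000) -> List[float]:
    """Proximal gradient (ISTA) for Equation (9).

    X rows start with the constant 1; beta[0] is the unpenalized intercept.
    `step` must not exceed 1 / (0.25 * largest eigenvalue of X'X) to converge.
    """
    dim = len(X[0])
    beta = [0.0] * dim
    for _ in range(n_iter):
        # Gradient of the (unnormalized) negative log-likelihood, as in (9).
        grad = [0.0] * dim
        for xi, yi in zip(X, y):
            r = _sigmoid(sum(b * x for b, x in zip(beta, xi))) - yi
            for j, xij in enumerate(xi):
                grad[j] += r * xij
        # Gradient step, then soft-thresholding on the penalized coordinates only.
        beta = [beta[0] - step * grad[0]] + [
            _soft_threshold(beta[j] - step * grad[j], step * lam) for j in range(1, dim)
        ]
    return beta
```

Solving (9) over a decreasing grid of λ values and recording when each coefficient first becomes nonzero is one way to obtain the order in which variables enter the LASSO path.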



3.3 Results 

Choosing L = 5, m = 5 and n = 5, the dimension of the limit order book's profile vector is $p = 1 + m(2L-1) + 6n = 76$.

The parameter λ in LASSO is selected by cross-validation; we then compute the AUC (area under the ROC curve) to measure the prediction quality. A ROC (receiver operating characteristic) curve is a graphical plot of the true positive rate vs. the false positive rate. The area under the ROC curve is a good measure of model prediction quality: the AUC value is equal to the probability that a classifier will rank a randomly chosen positive instance higher than a randomly chosen negative one.
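The probabilistic interpretation of the AUC translates directly into a pairwise count (the Mann-Whitney statistic). A quadratic-time Python sketch, with ties counted as one half; the function name is illustrative and labels are assumed to be 0/1.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive is scored above a random negative
    (ties count 1/2). O(P * N) pairwise version, fine for a sketch."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative instance")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, scores (0.9, 0.8, 0.3, 0.1) with labels (1, 0, 1, 0) win 3 of the 4 positive/negative pairs, giving an AUC of 0.75; a random classifier scores about 0.5.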

We show the out-of-sample AUC values in Figure 5, with the stocks sorted in alphabetical order. For each prediction task, the AUC value is around 0.80, and it is consistently high across all datasets and all stocks of the CAC40.
Figure 5: AUC value, price jump prediction, CAC40, April 2011. Panels: morning, afternoon and allday datasets, each for BidJump and AskJump prediction; x-axis: CAC40 stocks in alphabetical order.



In order to discover the contribution of each variable to the prediction, we further analyze the first five variables selected by LASSO for each prediction task for all stocks of the CAC40 (with the allday dataset).

Figures 6 and 7 show how many times a variable is selected as the first (second, third, fourth, fifth) variable by LASSO. We denote the log volume on the j-th bid (ask) limit price lagged by i events by VBj_i (VAj_i). Similarly, the log market order volume lagged by i events is denoted by VMO_i, and the binary variables lagged by i events are denoted by BMO_i, AMO_i, BTT_i, ATT_i, etc. For the sake of simplicity, for each selection order k (k ∈ {1, ..., 5}), each figure shows the frequency distribution of the five most frequently selected variables among 746 backtests.
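Given, for each backtest, the list of variables in the order in which LASSO selects them, the counts plotted in Figures 6 and 7 amount to a frequency table per rank. A sketch with `collections.Counter`; the input format and function name are hypothetical, and computing the selection order itself is not shown.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def selection_rank_counts(
    orders: Sequence[Sequence[str]], max_rank: int = 5, top: int = 5
) -> Dict[int, List[Tuple[str, int]]]:
    """For each rank k = 1..max_rank, the `top` variables most often selected
    k-th across backtests, with their counts."""
    result: Dict[int, List[Tuple[str, int]]] = {}
    for k in range(max_rank):
        # Backtests with fewer than k + 1 selected variables are skipped at this rank.
        counts = Counter(order[k] for order in orders if len(order) > k)
        result[k + 1] = counts.most_common(top)
    return result
```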

We observe that VB1_0, BMO_0 and VMO_0 are the most frequently selected variables for predicting the future bid-side inter-trade price jump, and that VA1_0, AMO_0 and VMO_0 are the most frequently selected variables for predicting the future ask-side inter-trade price jump. In contrast, trade-through is less informative and contributes little to price jump prediction. This suggests that market orders are sensitive to the liquidity on the best limit price: as soon as the liquidity on the best limit price becomes significantly low, the next market order may consume it immediately. The information provided by BMO_0 (AMO_0) and VMO_0 recalls the phenomenon of long memory in order flow; see Bouchaud et al. (2008) [5]. When a trader tries to buy or sell a large quantity of assets, they may split it into small pieces and execute them successively through market orders. Consequently, the direction of the preceding market order helps predict the next market order event.



Figure 6: Variable selection for BidJump prediction, CAC40, April 2011 (allday dataset). From left to right and from top to bottom, each panel shows how many times a variable is selected as the k-th variable by LASSO, k ∈ {1, ..., 5}. Variables shown per panel: k = 1: VB1_0, VMO_0, BMO_0, VB2_0, VA1_0; k = 2: BMO_0, VMO_0, VB1_0, VA1_0, VB2_0; k = 3: VMO_0, BMO_0, VA1_0, VB2_0, VB1_0; k = 4: BMO_0, VMO_0, VA1_0, VB2_0; k = 5: VMO_0, VB2_0, VA1_0, BMO_0, BTT_0.
Figure 7: Variable selection for AskJump prediction, CAC40, April 2011 (allday dataset). From left to right and from top to bottom, each panel shows how many times a variable is selected as the k-th variable by LASSO, k ∈ {1, ..., 5}. Variables shown per panel: k = 1: VA1_0, VMO_0, AMO_0, VA2_0, VB1_0; k = 2: VMO_0, AMO_0, VA1_0, VB1_0, VA2_0; k = 3: AMO_0, VMO_0, VB1_0, ATT_0, VA2_0; k = 4: AMO_0, VMO_0, VA2_0, VB1_0, ATT_0; k = 5: VMO_0, VB1_0, VA2_0, AMO_0, AMO_1.



Conclusion 

In this paper, we provide an empirical result on the relationship between the bid-ask limit order liquidity balance and the trade sign, and an analysis of the prediction of inter-trade price jump occurrence by logistic regression. We show that the limit order liquidity balance on the best bid/best ask is informative for predicting the next market order's direction. We then use limit order volumes, limit order price gaps and market order size to construct limit order book features for the prediction of inter-trade price jump occurrence. LASSO logistic regression is introduced to help us identify the most informative limit order book features for the prediction. The numerical analysis is done on two separate datasets: a morning dataset and an afternoon dataset. LASSO logistic regression achieves very good prediction results in terms of AUC. The AUC value is consistently high on both datasets and for all stocks, regardless of their liquidity. This good prediction quality implies that the limit order book profile is quite informative for predicting the incoming market order event. The variable selection by LASSO logistic regression shows that several variables are quite informative for inter-trade price jump prediction: the trade sign, the market order size and the liquidity on the best limit prices are the most informative. Nevertheless, the aggressiveness of market orders, measured by trade-throughs, has a smaller impact than we had expected. These results confirm that the limit order book is quite sensitive to the liquidity on the best limit prices and that there is long memory in order flow, as shown by other authors. This paper is merely a first attempt to uncover the information hidden in the limit order book, and further studies will be needed to better understand its full dynamics.